ABSTRACT


The adoption of educational technologies such as e-textbooks
offers a new opportunity to gain insight into teachers' usage of
ICT (Information and Communication Technologies). On an
e-textbook platform, customized digital products and learning
activities organized in a digital environment require teachers to
make greater efforts in planning lessons and producing resources.
In addition, usage of technology can vary greatly from one group
of teachers to another in various contexts. In this study, we
demonstrate how computations such as event segmentation and
contextual numbers can be exploited to visualize trajectories of
teachers' ICT usage. We also study the experience structure
via the implicit patterns within the raw data of an e-textbook
platform. Such automated visual characterization might be helpful
for the wide and scalable application of teaching analytics to
represent teachers' ICT usage.
1. INTRODUCTION


Information and Communication Technologies (ICT) are
becoming increasingly pervasive in education [12] and are making
a difference in the ways teachers plan lessons and organize
activities [13]. It is also well documented that teachers need
support to make effective use of information technology in their
teaching, because the incorporation of ICT is not an easy process
and involves many technical complexities [10]. With the goal of
better use of ICT, teaching analytics is conceived as an analytics
approach that focuses on the design, development, and evaluation
of visual analytics methods and tools for teachers [20].


However, the crucial step of supporting teacher interventions
based on learning analytics insights remains under-supported [17].
As often happens elsewhere in learning analytics, most learning
environments are not designed for data analysis and mining [8].
Even when they do support analysis, they are designed to analyze
student learning or behavior and provide feedback to the teacher
[1,20], not to analyze and represent the teacher data they store.
Therefore, many studies depict learning analytics for teachers
rather than analytics about teaching [17].


In addition, although much work has been done on visualizing
analytics results, the design and use of such visualizations are
less understood, which can lead to weak implementations that
promote ineffective feedback [21,19,3]. In many cases, it is not
easy to compare complex objects in a high-dimensional
visualization, which requires users to understand the semantics
of the visual representation and the features assumed by the
model and algorithm. Besides that, some visualization approaches
present a narrow scope of representation, focused on one snapshot
of a certain topic of data for a certain period of time. They
usually represent several aspects of the dataset that occur
within the environments, but neither represent the nature of
connections inside the datasets nor provide a global view of usage
[2]. As a consequence, the application of dashboards requires
additional information processing in many settings.


The purpose of this study is to design a computational procedure
based on behavior data, with the intent to create a visualization
of trajectories that helps describe teachers' ICT usage.


To explore these issues, we conduct a case study in which the data
is gathered from an e-textbook server without any additional
sensors or APIs. In a previous study [23], we found that a
segmentation method is effective in distilling features for
predicting e-textbook adoption in the early days. In this study, we
bring together event segmentation and a one-dimensional
Self-organizing map to integrate an authentic teaching experience
involving a digital environment with embedded robust and
continuous characterization of ICT usage trajectories. The raw data
records created on an e-textbook platform are computationally
transformed and displayed, so that teachers and other stakeholders
can use the resulting contextual visualization to gain insights and
improve dynamic and diagnostic decision-making.


2. DATA 


We investigated these issues in the context of data from an
e-textbook platform named ZoomClass. ZoomClass includes a
web-based authoring environment and an iPad application for
teachers. Teachers were given access to customize all digital
content for specific teaching objectives. They typically create
courses, upload media resources and products that they have mostly
customized themselves in other tools (such as PowerPoint), design
tasks, assign activities, and insert quizzes in the web-based
environment before class. They can also record and upload photos
and videos with the iPad application. The users of ZoomClass are
teachers and students at a primary school in Shanghai. We obtained
data on teacher authoring action records and student response
action records for 110 teachers enrolled on this e-textbook
platform, observed over more than 5 semesters since October 2014.
By January 2017, the teachers had performed a total of 117,324
actions, created 4,653 courses, uploaded 16,901 digital resources,
including almost 9,000 image products, and received 3,364,533
responses from students.
Figure 1. The iPad application ZoomClass


3. METHOD 


In this study, we bring together an event segmentation algorithm
and a nonparametric mapping called contextual numbers to
integrate an authentic teaching experience involving an e-textbook
platform with embedded continuous characterization of ICT usage
trajectories. In general, the intent of event segmentation is to
determine how a threshold should be set automatically when
partitioning action streams into usage feature spaces. The
contextual numbers approach is used to map the high-dimensional
space of usage to a continuous one-dimensional numerical field.
The numbers are ordered in the given context, so that similar
numbers refer to similar high-dimensional states of usage. Figure
2 shows the computational procedure and associated steps, which
are discussed in detail in this section.
Figure 2. The computational procedure and associated steps


3.1 Event Segmentation 

In our study, the data comes from the raw records of an e-textbook
platform. This data has two characteristics: 1. The data is
recorded only by the back-end server, without any sensor embedded
in the front end, which means that the grain size of our data is
much coarser than that of sensor data (such as clickstreams);
2. Teachers operate on multiple platforms, which interrupts data
capture when a teacher switches to another platform. These two
characteristics lead to a considerable amount of missing action
data in our data set. To address this issue, an event segmentation
method is introduced to transform action records into an event
dataset.


Event segmentation is a method that divides a given number
of observations into subsets with statistical characteristics that
are similar within each subset and different between subsets [4].
In this study, the goal of event segmentation is to automatically
partition teacher actions into separate events. The segmentation
method is based only on the date and time information of server
log records. We consider action records in chronological order
such that


$R = \{R_1, \ldots, R_m\}$ (1)


where $R_i$ is the $i$th action record in data set $R$ with length
$m$. An event segment $e_{i,j}$, which is a subset of $R$, can be
given as


$e_{i,j} = \{R_i, \ldots, R_j\}, \quad 1 \le i \le j \le m$ (2)


Intuitively, the time differences between consecutive action
records within an event are typically smaller than the time
differences between action records from separate events, so the
time intervals between observations are often used as a criterion
for partitioning [11].


Because teachers in various contexts use the e-textbook
differently, it is very likely that teachers act at diverse
frequencies during different periods. Zheng and colleagues [24]
developed an analysis method to discover users' water usage
habits. In their work, a novel continuous event segmentation
algorithm based on threshold optimization was created to
automatically separate the water usage records into multiple
individual bath events for each user. This study employs a similar
method to create features from teachers' action record data sets.
In the event segmentation algorithm created by Zheng et al., a
threshold of time difference is used to determine whether
consecutive action records belong to the same event. The algorithm
consists of the following steps: 1. Compute the intervals between
consecutive actions; 2. Compare every interval to the threshold of
time difference. In step 2, if the interval is smaller than the
threshold, the two actions are considered to be in the same event;
if the interval is greater, they are divided into two different
events. The algorithm runs through all of the intervals, after
which we obtain individual events from the action log sets. An
automatic threshold optimization model was developed to search for
the optimal threshold value for segmenting events.
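
The two steps above can be illustrated with a minimal Python sketch. It is not the implementation of Zheng et al. [24]: the function name `segment_events`, the sample log, and the choice to start a new event when an interval exactly equals the threshold are assumptions made for illustration.

```python
from datetime import datetime, timedelta


def segment_events(timestamps, threshold):
    """Split action timestamps into events using a fixed interval threshold.

    Consecutive records stay in the same event when the interval between
    them is smaller than `threshold`; otherwise a new event starts.
    (Treating an interval equal to the threshold as a break is an assumption.)
    """
    events = []
    for ts in sorted(timestamps):
        if events and ts - events[-1][-1] < threshold:
            events[-1].append(ts)   # same event as the previous record
        else:
            events.append([ts])     # interval too large: open a new event
    return events


# Hypothetical server log: five authoring actions by one teacher.
log = [
    datetime(2016, 3, 7, 9, 0),
    datetime(2016, 3, 7, 9, 4),
    datetime(2016, 3, 7, 9, 9),
    datetime(2016, 3, 7, 14, 30),
    datetime(2016, 3, 7, 14, 41),
]
events = segment_events(log, timedelta(minutes=10))
print([len(event) for event in events])
```

With a 10-minute threshold the 11-minute interval in the afternoon splits the last two actions into separate events; a 15-minute threshold would merge them, which is why the choice of threshold matters.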


The threshold optimization for each teacher in one week consists
of the following steps: 1. Segment events with successively varying
thresholds, with a fixed time delta d between two successive
thresholds. We consider this threshold set in ascending order
such that


$TS = \{ts_1, ts_2, ts_3, \ldots\}$ (3)


2. Compute the number of events $y_i$ for each threshold $ts_i$;
3. Specify the minimum rate of change of the event numbers for
optimal threshold detection. In step 3, the optimization algorithm
uses a sliding window with a fixed size. The window contains only n
points, beginning at the current point and ending right before the
next identified point. The optimization tries to find a starting
point that is followed by a sequence of almost unchanged points.
Suppose the threshold of the current point is $ts_i$; the
average rate of change of the event numbers, cr, is defined as
follows:


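$cr(ts_i) = \frac{1}{n} \sum_{k=i}^{i+n-1} \left| \frac{y_{k+1} - y_k}{ts_{k+1} - ts_k} \right|$ (4)


where $y_k$ is the number of events obtained with threshold
$ts_k$. The final optimal threshold can be selected from the given
threshold set as follows:


$ot(TS) = \arg\min_{ts_i \in TS} \, cr(ts_i)$ (5)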


Figure 3 shows an example of event segmentation with varying
thresholds. Here, the number of events declines rapidly while the
threshold is smaller than 10 minutes, which implies that most
intervals between the teacher's actions are smaller than 10
minutes. There is a significant possibility of separating an
individual event into two or more sub-events if a small value is
chosen as the threshold. Therefore, it is more rational to choose
as the threshold an interval value at which the number of
individual events has stopped falling and its rate of change
levels off at almost zero. The slopes between successive
thresholds are used to detect the change rate. When the average of
n consecutive slopes (in this case, n is set to 8) is closest to
zero, the first threshold point in the sliding window is flagged
as the optimal threshold value of an individual teacher's
inter-event interval in a week. In Figure 3, the point at 26
minutes is likely the optimal threshold.
Figure 3. A sliding window that searches for an optimal
threshold point. With n = 8, the point at 26 minutes is
likely the optimal threshold
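
The sliding-window search described in this section can be sketched in Python as follows. This is a hedged illustration rather than the original implementation: it takes the change rate to be the mean absolute slope of the event-count curve over n consecutive threshold steps, breaks ties by taking the earliest window, and uses hypothetical action times given in minutes. The names `count_events` and `optimal_threshold` are illustrative.

```python
def count_events(action_minutes, threshold):
    """Number of events when an interval >= threshold starts a new event."""
    ordered = sorted(action_minutes)
    if not ordered:
        return 0
    breaks = sum(1 for a, b in zip(ordered, ordered[1:]) if b - a >= threshold)
    return breaks + 1


def optimal_threshold(action_minutes, thresholds, n):
    """Return the first threshold of the window whose mean |slope| is smallest."""
    counts = [count_events(action_minutes, t) for t in thresholds]
    slopes = [
        abs(counts[k + 1] - counts[k]) / (thresholds[k + 1] - thresholds[k])
        for k in range(len(thresholds) - 1)
    ]
    if len(slopes) < n:
        raise ValueError("need at least n + 1 thresholds")
    best_start, best_rate = 0, None
    for i in range(len(slopes) - n + 1):
        rate = sum(slopes[i:i + n]) / n          # average change rate in the window
        if best_rate is None or rate < best_rate:  # strict '<' keeps the earliest tie
            best_start, best_rate = i, rate
    return thresholds[best_start]


# Hypothetical action times (minutes since the start of the week).
action_minutes = [0, 2, 5, 10, 17, 137, 141, 147, 347, 350]
thresholds = list(range(1, 13))   # fixed delta d = 1 minute
best = optimal_threshold(action_minutes, thresholds, n=3)
print(best)
```

In this toy log the event count stops changing once the threshold reaches 8 minutes, so the window starting there has a change rate of zero and is selected.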


3.2 Creation of Features 

We applied the event segmentation algorithm to both teachers' and
students' action records. The resulting segmented event dataset
consists of 10,146 event rows from 117,324 teacher authoring
action records, and 23,203 teaching activity rows from 3,364,533
student response records. For trajectory visualization, these
events are aggregated and resampled by week to generate a time
series data set. Eight features were distilled from the processed
dataset:


The total duration of producing events (DE): A producing event,
transformed from teachers' action data, indicates that teachers
create new media resources and upload files with the authoring
platform. The total duration of producing events tells us how long
teachers spend preparing their lessons on the learning platform
in a week.


The number of long producing events (LE): In order to
minimize noise in the segmentation, we discretize events
(excluding single-action events) into three buckets based on the
quartiles of the durations of all events. A producing event with a
duration longer than the upper quartile is considered a long
producing event.


The number of middle producing events (ME): A producing
event with a duration longer than the lower quartile and shorter
than the upper quartile is considered a middle producing event.


The number of short producing events (SE): A producing
event with a duration shorter than the lower quartile is considered
a short producing event.


The number of single-action producing events (SPE): Events
with only one single action are a special case. A single-action
event could arise when the production of a resource lasts for a
long time without any other neighboring action, or when just a
testing action is performed. Therefore, we separate the
single-action events into two groups by action type.


The number of single-action common events (SCE): An event
that consists of only one single action with no explicit relation
to producing, such as creating a virtual folder with a default
name, is considered a common event with a single action.


The number of teaching activities (TA): A teaching activity in
this study is about 'consuming', which provides evidence that
teachers use the resources they have uploaded to the learning
platform before class. With event segmentation, teaching
activities are derived from students' concurrent response records,
which include answer submission, media file upload, and help
requests. The tasks assigned by teachers inside the e-textbook
application are also considered teaching activities, even though
they are mostly finished after class.


The number of engaging days (ED): A day on which a teacher is
active on the authoring platform is considered an engaging day.
However, single-action common events are omitted when
determining whether a teacher is active on a given day.
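
The quartile-based bucketing behind the LE, ME, and SE features can be sketched as below. This is an illustrative assumption-laden example: the durations are hypothetical, the quartiles are computed over the list passed in, and events whose duration equals a quartile are assigned to the middle bucket, a case the text does not specify.

```python
from statistics import quantiles


def count_duration_buckets(durations):
    """Count short (SE), middle (ME), and long (LE) events by duration quartiles.

    `durations` holds the durations of multi-action producing events.
    Durations equal to a quartile fall into the middle bucket (an assumption).
    """
    lower, _, upper = quantiles(durations, n=4)   # lower quartile, median, upper quartile
    counts = {"SE": 0, "ME": 0, "LE": 0}
    for duration in durations:
        if duration > upper:
            counts["LE"] += 1
        elif duration < lower:
            counts["SE"] += 1
        else:
            counts["ME"] += 1
    return counts


# Hypothetical event durations in minutes.
durations = [2, 4, 6, 8, 10, 12, 14, 40]
buckets = count_duration_buckets(durations)
print(buckets)
```

Counting the labels per teacher and per week would then give the weekly LE, ME, and SE values used as features.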


3.3 Contextual Numbers

The Self-organizing map (SOM) is a nonlinear projection
algorithm introduced by Kohonen [7]. The earliest
applications were in engineering tasks; later the algorithm
became a generic methodology, which has been applied in
clustering, visualization, data organization, characterization, and
exploration [6]. A Self-organizing map consists of organized nodes,
each of which holds an N-dimensional weight vector. Given the
observations $X = \{x_1, x_2, \ldots, x_n\}$ in N-dimensional
space, $x_i \in \mathbb{R}^N$, the procedure can be summarized in
three processes: competition, cooperation, and self-adaptation. The
SOM training algorithm can be thought of as a net that is spread
over the data cloud. In general, it moves the weight vectors so
that they span the data cloud and neighboring nodes get similar
weight vectors [7].


Traditionally, most applications of the SOM algorithm have been
organized in a two-dimensional coordinate system (such as [2],
[18]). In these applications, after projecting the data onto the
SOM grid, the node indexes, taken as single values, create a new
contextual order, which can be used to transform each
high-dimensional point into a new computational space. Points that
are close on the grid are similar in this context; however, this
similarity cannot be interpreted along a single dimension in the
way that the classic number space can [15].


In this regard, a one-dimensional SOM called contextual numbers
was introduced by Moosavi [14]. This method can be seen as a
sequence of ordered numbers pointing to a high-dimensional
space; these numbers are ordered according to their similarities
within the selected high-dimensional state space or context. In
contextual numbers, K nodes are produced in one dimension
after the SOM is mapped with X, and each node, with an attached
high-dimensional weight vector, represents the original
information. Instead of using the values within the nodes, a series
of new contextual orders is created. The conversion can be
summarized in the following steps: 1) Calculate the posterior
probability of assigning each contextual number; 2) Select as the
node index the number at which the posterior probability reaches
its dominant peak. The difference between the two-dimensional and
the one-dimensional case is reflected in the relation between the
indices and the weight vectors. In a two-dimensional grid, the
neighborhood similarity expands in two directions. Therefore,
there is no direct correlation between the numerical values of the
indices and the similarity of their weight vectors. In a
one-dimensional grid, however, a valuable property of contextual
numbers is that there is a direct correlation between the indices
and the similarity of their weight vectors [14]. In most
two-dimensional cases, the final index of a trained SOM is not used
directly as a numerical value; the assigned weight vector is used
instead. Contextual numbers allow us to create a continuous number
space converted from a high-dimensional space, which fits
completely into a univariate space [15]. For the usage time series
analysis in this study, conversion to contextual numbers gives us a
univariate usage time series for each teacher, week by week.
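
To make the idea concrete, the following is a minimal pure-Python sketch of a one-dimensional SOM. It is not Moosavi's implementation [14]: as a simplifying assumption it uses the index of the best-matching node as the contextual number instead of the posterior-probability step described above, and the learning-rate schedule, neighborhood radius, node count, and sample vectors are all illustrative choices.

```python
import math
import random


def best_matching_unit(weights, x):
    """Competition: index of the node whose weight vector is closest to x."""
    return min(
        range(len(weights)),
        key=lambda j: sum((w - v) ** 2 for w, v in zip(weights[j], x)),
    )


def train_som_1d(data, k, epochs=100, seed=0):
    """Train a chain of k nodes; returns one weight vector per node."""
    rng = random.Random(seed)
    weights = [list(rng.choice(data)) for _ in range(k)]  # init from observations
    for epoch in range(epochs):
        decay = 1.0 - epoch / epochs
        rate = 0.5 * decay                      # shrinking learning rate
        radius = max(0.5, (k / 2.0) * decay)    # shrinking neighborhood radius
        for x in rng.sample(data, len(data)):
            bmu = best_matching_unit(weights, x)
            for j, w in enumerate(weights):
                # Cooperation: nodes near the winner on the chain move more.
                influence = math.exp(-((j - bmu) ** 2) / (2.0 * radius ** 2))
                for d in range(len(w)):
                    # Self-adaptation: pull the weight towards the observation.
                    w[d] += rate * influence * (x[d] - w[d])
    return weights


def contextual_number(weights, x):
    """Simplified contextual number: the index of the best-matching node."""
    return best_matching_unit(weights, x)


# Hypothetical weekly usage vectors (three features instead of eight).
weeks = [[0.0, 0.1, 0.0], [0.2, 0.0, 0.1], [2.0, 1.5, 1.0],
         [2.2, 1.4, 1.1], [5.0, 4.0, 3.5], [5.1, 4.2, 3.4]]
som = train_som_1d(weeks, k=5)
series = [contextual_number(som, week) for week in weeks]
print(series)
```

Applied to one teacher's weekly feature vectors, `series` is the univariate trajectory that the later visualization plots against time.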


It should be noted that the indices we map to the nodes are not
classic numbers. The values of these indices do not represent
performance grades, but the similarity of two or more nodes. If
two indices have close values (e.g. node number 1 and node
number 2 in a SOM network; numbering is arbitrary, but we
usually start from the upper-left corner and go row by row), they
are similar in this context [15].


3.4 Second-Stage Clustering

With the indices (contextual numbers), hierarchical clustering is
performed in this part. One advantage of hierarchical clustering
algorithms is that they can help with the interpretation of the
results by creating meaningful taxonomies. Because these numbers
carry contextual information that is difficult to interpret, a
common two-stage clustering is employed to combine the most
similar indexes, as previous applications did for the nodes
of two-dimensional SOM grids (such as [22,16]). A typology is then
developed from the clustering results, which has also been shown to
make the data more accessible when stakeholders are involved in
exploring it by visual inspection [5].


To obtain good clustering performance, we first employ
k-means and an intrinsic metric, the within-cluster Sum of
Squared Errors (SSE), to compare the performance of different
numbers of clusters. Based on the within-cluster SSE, the elbow
method is used to estimate the optimal number of clusters k for a
given task. In this study, the elbow is located at k = 5, so we
choose 5 as the number of clusters. Finally, we perform
hierarchical clustering on the contextual numbers.
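
The within-cluster SSE curve used by the elbow method can be sketched as follows. The example is only an illustration under stated assumptions: it runs a plain Lloyd-style k-means on scalar values with a deterministic initialization chosen for reproducibility, and the input values are hypothetical, not the study's contextual numbers. It does not include the final hierarchical clustering step.

```python
def kmeans_1d(values, k, iterations=100):
    """Lloyd-style k-means on scalars; returns (centers, within-cluster SSE)."""
    ordered = sorted(values)
    # Deterministic start: k evenly spaced order statistics (an assumption).
    centers = [float(ordered[(2 * i + 1) * len(ordered) // (2 * k)]) for i in range(k)]
    for _ in range(iterations):
        clusters = [[] for _ in range(k)]
        for value in values:
            nearest = min(range(k), key=lambda c: (value - centers[c]) ** 2)
            clusters[nearest].append(value)
        # Keep the old center if a cluster ends up empty.
        updated = [
            sum(cluster) / len(cluster) if cluster else centers[i]
            for i, cluster in enumerate(clusters)
        ]
        if updated == centers:
            break
        centers = updated
    sse = sum(min((value - c) ** 2 for c in centers) for value in values)
    return centers, sse


# Hypothetical one-dimensional values forming three obvious groups.
values = [0, 1, 2, 10, 11, 12, 20, 21, 22]
sse_by_k = {k: kmeans_1d(values, k)[1] for k in range(1, 5)}
for k, sse in sse_by_k.items():
    print(k, round(sse, 2))
```

The SSE drops sharply up to k = 3 and only marginally afterwards, which is the kind of bend the elbow method looks for.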


4. RESULT 


This section presents the two stages of our research. In the first
part, the high-dimensional observations from the processed time
series data are converted to the corresponding contextual numbers,
a series of continuous indices, and a specific typology is built
for interpretation. In the second part, we apply this to produce
visualizations of teachers' ICT usage trajectories.


4.1 Usage Typology 


First, a SOM is trained on a one-dimensional network
with the eight-dimensional usage data, and the range of indexes is
set from 0 to 29. Each index node therefore has two neighbors,
except the first and the last. We then apply the second-stage
clustering to discretize the contextual number indexes into
groups for interpretation, which yields five groups in our study.
The details of the groups are shown in Table 1.


As can be seen in Table 1, Group A characterizes the Limited use
pattern. Teachers in this group spend very little time using
the authoring platform. The near absence of products indicates that
they almost never upload media resources. Meanwhile, they organize
a few activities once a week, which illustrates that the
technology is seldom used in their classes. The usage of this group
usually occurs at the beginning or the end of semesters.


Group B characterizes the Early use pattern. The teachers in this
group organize even fewer activities than the teachers in Group A,
but they have at least one middle or short producing event a week,
which means that they try to use the platform to prepare lessons
and produce some resources for them. We find that the usage of
this group is the mainstream during the first three semesters.


Group C characterizes the Consuming use pattern. Teachers in this
group use the learning platform more frequently than those in
Group A and Group B. They are very willing to use the
application to organize teaching activities and usually receive
plenty of responses on the e-textbook, but they produce at most
once a week. We also find that they have more single-action
common events than teachers in the other groups, since they
tend to consume resources rather than produce them.


Group D characterizes the Moderate use pattern. The teachers in
this group begin to produce resources on the platform frequently;
many of them use the learning platform three out of five
working days every week. Compared to the three groups
mentioned before, teachers in this group actually use the
technology to plan lessons with resources that they build
themselves. As they produce frequently, we find that they
have the highest number of middle producing events. Compared to
teachers in Group C, however, they have slightly fewer activities,
which means that they do not rely on the e-textbook to teach in
class as teachers in Group C do.


Group E characterizes the Intensive use pattern. The teachers in
this group usually produce resources heavily over a long time and
produce many resources on the platform. Of the five
working days each week, they produce on almost every day. They also
organize numerous activities, which means that they actually use
the application a lot in class.


Therefore, we can build meaningful names and stories for
every group and attach fictitious typology labels to the contextual
number indexes, in order to provide an easy way to understand the
contextual meaning of the indexes. As shown in Table 2, we
summarize each group, giving its key characteristics and the
indexes that belong to it.
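
Attaching a typology label to a contextual number amounts to a range lookup. The sketch below uses the index ranges and group names reported in Table 2; the function name `typology_label` and the error handling are illustrative assumptions.

```python
# (lowest index, highest index, group, name), as listed in Table 2.
GROUPS = [
    (0, 5, "A", "Limited use"),
    (6, 14, "B", "Early use"),
    (15, 19, "C", "Consuming use"),
    (20, 25, "D", "Moderate use"),
    (26, 29, "E", "Intensive use"),
]


def typology_label(index):
    """Return (group, name) for a contextual number index between 0 and 29."""
    for low, high, group, name in GROUPS:
        if low <= index <= high:
            return group, name
    raise ValueError(f"index {index} is outside the trained range 0 to 29")


print(typology_label(16))
```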
Table 1. Grouping results showing the mean value for each
feature and cluster

Group | A           | B         | C             | D            | E
Name  | Limited use | Early use | Consuming use | Moderate use | Intensive use
Index | 0~5         | 6~14      | 15~19         | 20~25        | 26~29
DE    | …           | …         | …             | …            | …
LE    | …           | …         | …             | …            | …
ME    | 0.000       | 0.692     | 0.742         | 2.597        | 2.012
SE    | 0.029       | 0.371     | 1.000         | 0.827        | 0.514
SPE   | 0.206       | 0.432     | 0.327         | 0.931        | 0.452
SCE   | 0.531       | 0.532     | 2.336         | 1.743        | 2.218
TA    | 6.396       | 2.883     | 21.107        | 8.560        | 32.863
ED    | 0.025       | 1.065     | 1.408         | 2.866        | 4.174


Table 2. The user typology derived from two-stage clustering

Group | Indexes | Name          | Typology Label
A     | 0-5     | Limited use   | Almost no product
      |         |               | A few activities once a week
      |         |               | Centralized in the beginning or end of semesters
B     | 6-14    | Early use     | Few teaching activities
      |         |               | At least a middle/short event a week
      |         |               | The mainstream of the earlier stage
C     | 15-19   | Consuming use | Plenty of activities
      |         |               | Producing at most once a week
      |         |               | More independent actions
D     | 20-25   | Moderate use  | Frequently producing
      |         |               | Highest middle-event rates
      |         |               | Slightly less activities
E     | 26-29   | Intensive use | Heavily producing during a long time
      |         |               | Almost producing everyday
      |         |               | Organizing numerous activities


[Figure 4: weekly trajectories of Teacher1 to Teacher4. Y axis:
contextual number index, marked Unused (0), Limited use (5),
Early use (14), Consuming use (19), Moderate use (25), and
Intensive use (29). X axis: week, from the 1st Week to the 93rd
Week.]
Figure 4. Sample trajectory visualization


4.2 Usage Trajectory


Finally, we can explore the visual trajectories with the typology
to identify implicit patterns and hypotheses. This visualization
provides the capability to trace states and discover patterns
without reducing the information to simple statistics. It
illustrates the teacher usage trajectories, which is helpful
because teachers and stakeholders rarely trace the process of how
they use ICT in teaching.


As shown in Figure 4, the Y axis is the index of the
one-dimensional SOM and the X axis shows the week, which is the
unit of time observed in this case. The figure shows the states and
trajectories of each teacher over time. Similar teachers therefore
have similar index numbers over time. This allows us to identify
how a teacher uses the technology by comparing the trajectories
and patterns of each teacher in relation to the others using the
contextual numbers of the SOM. If we are familiar with a few
teachers' usage, we can treat these teachers as contexts for
relative positioning when identifying a new teacher's usage, even
if we do not know the interpretation of the contextual numbers. As
shown in Figure 4, we can treat Teacher 3 as a template if we are
familiar with his or her usage or performance; the usage
of Teacher 4 is then easy to identify by comparing their similar
trajectories. Our statistical analysis of the index set
shows that Teacher 3 and Teacher 4 have the lowest Euclidean
distance. We can also automatically find similar teachers based on
distance calculation between trajectories.


As the use of an "adopted" technology can vary greatly from one
group of teachers to another [9], this figure provides an easy way
to partition the teachers in terms of the variations along the two
dimensions of contextual index and time of usage. In this case,
for the intense user group, Teacher 3 and Teacher 4,
the contextual number indexes mostly ranged from 10 to 29,
which is almost consistently higher than the indexes of the
moderate user group, Teacher 1 and Teacher 2, whose usage was
mostly labeled as early use or consuming use in the first three
semesters. Apparently, Teacher 1 and Teacher 2 adopted this tool
for teaching, but did not rely on it in the same way that
Teacher 3 and Teacher 4 did. However, it is not rational to
evaluate teachers' ICT performance by the index number,
because the SOM indexes are used as computable numbers
to represent the state based on the contexts, and their values
do not follow the concept of natural numbers that can be
interpreted as ordered grades. Therefore, a higher index does not
always indicate better performance, even though a higher
contextual number index happens to be labeled with more intensive
use in this case.
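
The distance-based comparison of trajectories mentioned above can be sketched as follows. The weekly index values are hypothetical and much shorter than the real series, the helper names are illustrative, and the sketch assumes trajectories of equal length observed over the same weeks.

```python
import math
from itertools import combinations


def trajectory_distance(a, b):
    """Euclidean distance between two equally long contextual number series."""
    if len(a) != len(b):
        raise ValueError("trajectories must cover the same weeks")
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def closest_pair(trajectories):
    """Return the pair of teacher names whose trajectories are closest."""
    return min(
        combinations(sorted(trajectories), 2),
        key=lambda pair: trajectory_distance(trajectories[pair[0]], trajectories[pair[1]]),
    )


# Hypothetical weekly contextual numbers for four teachers.
trajectories = {
    "teacher1": [6, 8, 15, 16, 7, 6],
    "teacher2": [7, 9, 14, 17, 8, 5],
    "teacher3": [20, 29, 16, 28, 27, 22],
    "teacher4": [21, 29, 17, 27, 26, 23],
}
pair = closest_pair(trajectories)
print(pair, trajectory_distance(trajectories[pair[0]], trajectories[pair[1]]))
```

The distance is computed on the indexes as positions in the contextual order, so it expresses similarity of usage states rather than a difference in performance.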


This method is also able to indicate potential patterns in the
trajectories of contextual numbers. As shown in Figure 4, the state
of each teacher's usage fluctuates visibly over each semester. More
specifically, consider Teacher 4 in the last semester. At the
beginning of this semester, the state stands at a limited
use index. Then the number shoots up over the next two weeks,
peaking at 29, which means a state of intensive use. After that, the
contextual number declines rapidly for two weeks, bottoming out
at 16, which is labeled as a consuming use index. The next week
shows a very sharp rise, reaching the intensive use area
again. According to the usage indexes in the following weeks, a
total of 5 peaks can be detected. The peak pattern
discovered from trajectory plotting describes a behavior in which a
teacher tends to produce teaching resources intensively in the
first week, then consume them in that week and the following one
to two weeks. To explore this idea, we apply frequent sequence
mining to the segmented trajectory data of active teachers. The
result shows that the sequences of the peak pattern (such as the
sequence [Consuming use, Intensive use, Consuming use]) all have
the highest frequency among sequences of the same length in weeks.
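
A simple way to look for such recurring patterns is to count contiguous label sequences of a fixed length. The sketch below is a simplified stand-in, not the frequent sequence mining procedure used in the study, whose algorithm is not specified here; the weekly labels are hypothetical.

```python
from collections import Counter


def frequent_sequences(labels, length):
    """Count every contiguous run of `length` weekly labels."""
    return Counter(
        tuple(labels[i:i + length]) for i in range(len(labels) - length + 1)
    )


# Hypothetical weekly typology labels for one teacher in one semester.
weekly_labels = [
    "Limited use", "Intensive use", "Consuming use", "Intensive use",
    "Consuming use", "Consuming use", "Intensive use", "Consuming use",
]
counts = frequent_sequences(weekly_labels, length=3)
print(counts.most_common(1))
```

In this toy series the sequence (Consuming use, Intensive use, Consuming use) occurs twice and every other three-week sequence occurs once.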


5. CONCLUSION 


This paper introduced a computational procedure for visualizing the
trajectories of teachers' ICT usage, based on the resource
producing process and on the experience structure found via the
implicit patterns within the raw data, by means of event
segmentation and contextual numbers. The resulting visualization
provides the capability to trace states and discover patterns
without reducing the information to simple statistics. Such
automated visual characterization might be helpful for the wide
and scalable application of teaching analytics to represent
teachers' ICT usage. Our future work will be oriented to
spatiotemporal dynamics in education, especially the application
of ICT, in which the knowledge extraction of web-based education
systems can be viewed as a formative evaluation technique. In this
setting, high-dimensional time series with different features can
be replaced by a series of contextual numbers, and these
numbers can be embedded in any data-driven analysis and
prediction [14].